week 7: multilevel models
multilevel adventures
mlm
Often, there are opportunities to cluster your observations – repeated measures, group membership, hierarchies, even different measures for the same particiapnt. Whenever you can cluster, you should!
Aggregation is bad
Regressions within regressions (ie coefficients as outcomes)
Questions at different levels
Variance decomposition
Learning from other data through pooling/shrinkage
Parameters that depend on parameters
Aggregation * Between person H1: Do students who study more get better grades? * Within person H2: When a student studies, do they get better grades? * H1 and H2 are independent from one another! Aggregation collapses the two. When you have nested data with many DVs it is important to not aggregate.
today’s topics
More than one clustering variable
multilevel slopes
more than one type of cluster
McElreath doesn’t cover this in his video lecture, but this is from the textbook and worth discussing.
Data from Silk et al. (2005)
From McElreath:
The data for this example come from an experiment aimed at evaluating the prosocial tendencies of chimpanzees (Pan troglodytes ). The experimental structure mimics many common experiments conducted on human students (Homo sapiens studiensis ) by economists and psychologists. A focal chimpanzee sits at one end of a long table with two levers, one on the left and one on the right in this figure. On the table are four dishes which may contain desirable food items. The two dishes on the right side of the table are attached by a mechanism to the right-hand lever. The two dishes on the left side are similarly attached to the left-hand lever.
When either the left or right lever is pulled by the focal animal, the two dishes on the same side slide towards opposite ends of the table. This delivers whatever is in those dishes to the opposite ends. In all experimental trials, both dishes on the focal animal’s side contain food items. But only one of the dishes on the other side of the table contains a food item. Therefore while both levers deliver food to the focal animal, only one of the levers delivers food to the other side of the table.
There are two experimental conditions. In the partner condition, another chimpanzee is seated at the opposite end of the table, as pictured in the figure. In the control condition, the other side of the table is empty. Finally, two counterbalancing treatments alternate which side, left or right, has a food item for the other side of the table. This helps detect any handedness preferences for individual focal animals.
When human students participate in an experiment like this, they nearly always choose the lever linked to two pieces of food, the prosocial option, but only when another student sits on the opposite side of the table. The motivating question is whether a focal chimpanzee behaves similarly, choosing the prosocial option more often when another animal is present. In terms of linear models, we want to estimate the interaction between condition (presence or absence of another animal) and option (which side is prosocial).
data (chimpanzees, package= "rethinking" )
d <- chimpanzees
rethinking:: precis (d)
mean sd 5.5% 94.5% histogram
actor 4.0000000 2.0019871 1.000 7.000 ▇▇▁▇▁▇▁▇▁▇▁▇
recipient 5.0000000 2.0039801 2.000 8.000 ▇▇▁▇▁▇▁▇▁▇▁▇
condition 0.5000000 0.5004968 0.000 1.000 ▇▁▁▁▁▁▁▁▁▇
block 3.5000000 1.7095219 1.000 6.000 ▇▇▁▇▁▇▁▇▁▇
trial 36.5000000 20.8032533 4.665 68.335 ▇▇▇▇▇▇▇▁
prosoc_left 0.5000000 0.5004968 0.000 1.000 ▇▁▁▁▁▁▁▁▁▇
chose_prosoc 0.5674603 0.4959204 0.000 1.000 ▅▁▁▁▁▁▁▁▁▇
pulled_left 0.5793651 0.4941515 0.000 1.000 ▅▁▁▁▁▁▁▁▁▇
We could model the interaction between condition (presence/absence of another animal) and option (which side is prosocial), but it is more difficult to assign sensible priors to interaction effects. Another option, because we’re working with categorical variables, is to turn our 2x2 into one variable with 4 levels.
d$ treatment <- factor (1 + d$ prosoc_left + 2 * d$ condition)
d %>% count (treatment, prosoc_left, condition)
treatment prosoc_left condition n
1 1 0 0 126
2 2 1 0 126
3 3 0 1 126
4 4 1 1 126
t_labels = c ("r/n" , "l/n" , "r/p" , "l/p" )
In this experiment, each pull is within a cluster of pulls belonging to an individual chimpanzee. But each pull is also within an experimental block, which represents a collection of observations that happened on the same day. So each observed pull belongs to both an actor (1 to 7) and a block (1 to 6). There may be unique intercepts for each actor as well as for each block.
Mathematical model:
\[\begin{align*}
L_i &\sim \text{Binomial}(1, p_i) \\
\text{logit}(p_i) &= \bar{\alpha} + \alpha_{\text{ACTOR[i]}} + \bar{\gamma} + \gamma_{\text{BLOCK[i]}} + \beta_{\text{TREATMENT[i]}} \\
\beta_j &\sim \text{Normal}(0, 0.5) \text{ , for }j=1..4\\
\alpha_j &\sim \text{Normal}(0, \sigma_{\alpha}) \text{ , for }j=1..7\\
\gamma_j &\sim \text{Normal}(0, \sigma_{\gamma}) \text{ , for }j=1..7\\
\bar{\alpha} &\sim \text{Normal}(0, 1.5) \\
\bar{\gamma} &\sim \text{Normal}(0, 1.5) \\
\sigma_{\alpha} &\sim \text{Exponential}(1) \\
\sigma_{\gamma} &\sim \text{Exponential}(1) \\
\end{align*}\]
m1 <-
brm (
family = bernoulli,
data = d,
pulled_left ~ 0 + treatment + (1 | actor) + (1 | block),
prior = c (prior (normal (0 , 1.5 ), class = b),
prior (exponential (1 ), class = sd, group = actor),
prior (exponential (1 ), class = sd, group = block)),
chains= 4 , cores= 4 , iter= 2000 , warmup= 1000 ,
seed = 1 ,
file = here ("files/models/71.1" )
)
Family: bernoulli
Links: mu = logit
Formula: pulled_left ~ 0 + treatment + (1 | actor) + (1 | block)
Data: d (Number of observations: 504)
Draws: 4 chains, each with iter = 2000; warmup = 1000; thin = 1;
total post-warmup draws = 4000
Multilevel Hyperparameters:
~actor (Number of levels: 7)
Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
sd(Intercept) 1.96 0.63 1.06 3.45 1.00 1250 1660
~block (Number of levels: 6)
Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
sd(Intercept) 0.20 0.17 0.01 0.63 1.00 1351 1690
Regression Coefficients:
Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
treatment1 0.25 0.57 -0.90 1.38 1.00 1083 1577
treatment2 0.85 0.57 -0.31 1.94 1.00 1044 1743
treatment3 -0.16 0.57 -1.33 0.94 1.00 1062 1573
treatment4 0.72 0.57 -0.46 1.81 1.00 1022 1565
Draws were sampled using sampling(NUTS). For each parameter, Bulk_ESS
and Tail_ESS are effective sample size measures, and Rhat is the potential
scale reduction factor on split chains (at convergence, Rhat = 1).
Estimate Est.Error Q2.5 Q97.5
b_treatment1 0.25346086 0.5690543 -8.956685e-01 1.3752420
b_treatment2 0.85408515 0.5716742 -3.133519e-01 1.9394030
b_treatment3 -0.16255802 0.5734419 -1.331065e+00 0.9439413
b_treatment4 0.72272758 0.5735628 -4.574810e-01 1.8120448
sd_actor__Intercept 1.95765160 0.6271925 1.058292e+00 3.4481233
sd_block__Intercept 0.20429722 0.1714939 7.619742e-03 0.6298440
r_actor[1,Intercept] -0.76806254 0.5871299 -1.884859e+00 0.4127745
r_actor[2,Intercept] 4.16869772 1.2800422 2.187441e+00 7.2504878
r_actor[3,Intercept] -1.07340338 0.5831362 -2.164617e+00 0.1178102
r_actor[4,Intercept] -1.06777362 0.5979555 -2.198872e+00 0.1664252
r_actor[5,Intercept] -0.76831595 0.5940808 -1.901701e+00 0.4779332
r_actor[6,Intercept] 0.18036694 0.5931503 -9.723677e-01 1.3304202
r_actor[7,Intercept] 1.71737307 0.6527872 4.933997e-01 3.0887395
r_block[1,Intercept] -0.15780279 0.2170111 -6.890853e-01 0.1378163
r_block[2,Intercept] 0.03741492 0.1817171 -3.388320e-01 0.4520013
r_block[3,Intercept] 0.05159167 0.1837081 -2.778782e-01 0.4987016
r_block[4,Intercept] 0.01488767 0.1817779 -3.471849e-01 0.4117812
r_block[5,Intercept] -0.02724225 0.1824159 -4.304912e-01 0.3526096
r_block[6,Intercept] 0.10603610 0.1960600 -2.018372e-01 0.5857896
lprior -8.04858139 0.8743820 -1.025541e+01 -6.8249218
lp__ -288.90426240 3.8659615 -2.973491e+02 -282.5082745
m1 %>%
mcmc_plot (variable = c ("^r_" , "^b_" , "^sd_" ), regex = T) +
theme (axis.text.y = element_text (hjust = 0 ))
Zooming in on just the actor and block effects. (Remember, these are differences from the weighted average.)
m1 %>%
mcmc_plot (variable = c ("^r_" ), regex = T) +
theme (axis.text.y = element_text (hjust = 0 ))
Code
as_draws_df (m1) %>%
select (starts_with ("sd" )) %>%
pivot_longer (everything ()) %>%
ggplot (aes (x = value, fill = name)) +
geom_density (linewidth = 0 , alpha = 3 / 4 , adjust = 2 / 3 , show.legend = F) +
annotate (geom = "text" , x = 0.67 , y = 2 , label = "block" , color = "#5e8485" ) +
annotate (geom = "text" , x = 2.725 , y = 0.5 , label = "actor" , color = "#0f393a" ) +
scale_fill_manual (values = c ("#0f393a" , "#5e8485" )) +
scale_y_continuous (NULL , breaks = NULL ) +
ggtitle (expression (sigma["group" ])) +
coord_cartesian (xlim = c (0 , 4 ))
Let’s look at the predictions by actor and block to confirm.
Code
d %>% distinct (actor, block, treatment) %>%
add_epred_draws (m1) %>%
mutate (treatment= factor (treatment,
levels= as.character (1 : 4 ),
labels= t_labels)) %>%
group_by (actor, treatment) %>%
mean_qi (.epred) %>%
ggplot ( aes (x= treatment, y= .epred, group= 1 ) ) +
geom_point () +
geom_line () +
labs (x= NULL , y= "p(pull left)" , title= "by actor" ) +
facet_wrap (~ actor)
Code
d %>% distinct (actor, block, treatment) %>%
add_epred_draws (m1) %>%
mutate (treatment= factor (treatment,
levels= as.character (1 : 4 ),
labels= t_labels)) %>%
group_by (block, treatment) %>%
mean_qi (.epred) %>%
ggplot ( aes (x= treatment, y= .epred, group= 1 ) ) +
geom_point () +
geom_line () +
labs (x= NULL , y= "p(pull left)" , title= "by block" ) +
facet_wrap (~ block)
slopes
Let’s start by simulating cafe data.
# ---- set population-level parameters -----
a <- 3.5 # average morning wait time
b <- (- 1 ) # average difference afternoon wait time
sigma_a <- 1 # std dev in intercepts
sigma_b <- 0.5 # std dev in slopes
rho <- (- 0.7 ) #correlation between intercepts and slopes
# ---- create vector of means ----
Mu <- c (a, b)
# ---- create matrix of variances and covariances ----
sigmas <- c (sigma_a,sigma_b) # standard deviations
Rho <- matrix ( c (1 ,rho,rho,1 ) , nrow= 2 ) # correlation matrix
# now matrix multiply to get covariance matrix
Sigma <- diag (sigmas) %*% Rho %*% diag (sigmas)
# ---- simulate intercepts and slopes -----
N_cafes = 20
library (MASS)
set.seed (5 )
vary_effects <- mvrnorm ( n= N_cafes, mu = Mu, Sigma= Sigma)
a_cafe <- vary_effects[, 1 ]
b_cafe <- vary_effects[, 2 ]
# ---- simulate observations -----
set.seed (22 )
N_visits <- 10
afternoon <- rep (0 : 1 ,N_visits* N_cafes/ 2 )
cafe_id <- rep ( 1 : N_cafes , each= N_visits )
mu <- a_cafe[cafe_id] + b_cafe[cafe_id]* afternoon
sigma <- 0.5 # std dev within cafes
wait <- rnorm ( N_visits* N_cafes , mu , sigma )
d <- data.frame ( cafe= cafe_id , afternoon= afternoon , wait= wait )
mean sd 5.5% 94.5% histogram
cafe 10.500000 5.7807513 2.000000 19.000000 ▇▇▇▇▇▇▇▇▇▇
afternoon 0.500000 0.5012547 0.000000 1.000000 ▇▁▁▁▁▁▁▁▁▇
wait 3.073405 1.1082252 1.477719 4.869957 ▁▁▂▅▇▇▅▇▃▃▁▁▁▁
cafe afternoon wait
1 1 0 3.967893
2 1 1 3.857198
3 1 0 4.727875
4 1 1 2.761013
5 1 0 4.119483
6 1 1 3.543652
7 1 0 4.190949
8 1 1 2.533223
9 1 0 4.124032
10 1 1 2.764887
a simulation note from RM
In this exercise, we are simulating data from a generative process and then analyzing that data with a model that reflects exactly the correct structure of that process. But in the real world, we’re never so lucky. Instead we are always forced to analyze data with a model that is MISSPECIFIED: The true data-generating process is different than the model. Simulation can be used however to explore misspecification. Just simulate data from a process and then see how a number of models, none of which match exactly the data-generating process, perform. And always remember that Bayesian inference does not depend upon data-generating assumptions, such as the likelihood, being true. Non-Bayesian approaches may depend upon sampling distributions for their inferences, but this is not the case for a Bayesian model. In a Bayesian model, a likelihood is a prior for the data, and inference about parameters can be surprisingly insensitive to its details.
Mathematical model:
likelihood function and linear model
\[\begin{align*}
W_i &\sim \text{Normal}(\mu_i, \sigma) \\
\mu_i &= \alpha_{CAFE[i]} + \beta_{CAFE[i]}(\text{Afternoon}_i)
\end{align*}\]
varying intercepts and slopes
\[\begin{align*}
\begin{bmatrix} \alpha_{CAFE[i]} \\ \beta_{CAFE[i]} \end{bmatrix} &\sim \text{MVNormal}( \begin{bmatrix} \alpha \\ \beta \end{bmatrix}, \mathbf{S}) \\
\mathbf{S} &\sim \begin{pmatrix} \sigma_{\alpha}, & 0 \\ 0, & \sigma_{\beta}\end{pmatrix}\mathbf{R}\begin{pmatrix} \sigma_{\alpha}, & 0 \\ 0, & \sigma_{\beta}\end{pmatrix} \\
\end{align*}\]
priors
\[\begin{align*}
\alpha &\sim \text{Normal}(5,2) \\
\beta &\sim \text{Normal}(-1,0.5) \\
\sigma &\sim \text{Exponential}(1) \\
\sigma_{\alpha} &\sim \text{Exponential}(1) \\
\sigma_{\beta} &\sim \text{Exponential}(1) \\
\mathbf{R} &\sim \text{LKJcorr}(2)
\end{align*}\]
LKJ correlation prior
Code
# examples
rlkj_1 = rethinking:: rlkjcorr (1e4 , K= 2 , eta= 1 )
rlkj_2 = rethinking:: rlkjcorr (1e4 , K= 2 , eta= 2 )
rlkj_4 = rethinking:: rlkjcorr (1e4 , K= 2 , eta= 4 )
rlkj_6 = rethinking:: rlkjcorr (1e4 , K= 2 , eta= 6 )
data.frame (rlkj_1= rlkj_1[,1 ,2 ],
rlkj_2= rlkj_2[,1 ,2 ],
rlkj_4= rlkj_4[,1 ,2 ],
rlkj_6= rlkj_6[,1 ,2 ]) %>%
ggplot () +
geom_density (aes (x= rlkj_1, color = "1" ), alpha= .3 ) +
geom_density (aes (x= rlkj_2, color = "2" ), alpha= .3 ) +
geom_density (aes (x= rlkj_4, color = "4" ), alpha= .3 ) +
geom_density (aes (x= rlkj_6, color = "6" ), alpha= .3 ) +
labs (x= "R" , color= "eta" ) +
theme (legend.position = "top" )
m2 <- brm (
data = d,
family = gaussian,
wait ~ 1 + afternoon + (1 + afternoon | cafe),
prior = c (
prior ( normal (5 ,2 ), class= Intercept ),
prior ( normal (- 1 , .5 ), class= b),
prior ( exponential (1 ), class= sd),
prior ( exponential (1 ), class= sigma),
prior ( lkj (2 ), class= cor)
),
iter= 2000 , warmup= 1000 , chains= 4 , cores= 4 , seed= 9 ,
file= here ("files/models/71.2" )
)
posterior_summary (m2) %>% round (2 )
Estimate Est.Error Q2.5 Q97.5
b_Intercept 3.66 0.23 3.21 4.11
b_afternoon -1.13 0.14 -1.41 -0.85
sd_cafe__Intercept 0.97 0.17 0.71 1.37
sd_cafe__afternoon 0.59 0.12 0.39 0.87
cor_cafe__Intercept__afternoon -0.51 0.18 -0.80 -0.10
sigma 0.47 0.03 0.42 0.53
Intercept 3.10 0.20 2.71 3.50
r_cafe[1,Intercept] 0.55 0.29 -0.01 1.14
r_cafe[2,Intercept] -1.50 0.30 -2.10 -0.90
r_cafe[3,Intercept] 0.70 0.30 0.12 1.30
r_cafe[4,Intercept] -0.42 0.29 -0.98 0.15
r_cafe[5,Intercept] -1.79 0.30 -2.36 -1.20
r_cafe[6,Intercept] 0.60 0.29 0.02 1.18
r_cafe[7,Intercept] -0.05 0.29 -0.64 0.55
r_cafe[8,Intercept] 0.28 0.30 -0.31 0.86
r_cafe[9,Intercept] 0.32 0.29 -0.28 0.90
r_cafe[10,Intercept] -0.10 0.29 -0.68 0.49
r_cafe[11,Intercept] -1.74 0.29 -2.31 -1.17
r_cafe[12,Intercept] 0.18 0.29 -0.40 0.76
r_cafe[13,Intercept] 0.22 0.30 -0.37 0.82
r_cafe[14,Intercept] -0.49 0.30 -1.08 0.09
r_cafe[15,Intercept] 0.79 0.30 0.22 1.39
r_cafe[16,Intercept] -0.27 0.29 -0.86 0.32
r_cafe[17,Intercept] 0.55 0.30 -0.03 1.15
r_cafe[18,Intercept] 2.08 0.30 1.52 2.68
r_cafe[19,Intercept] -0.42 0.30 -1.00 0.16
r_cafe[20,Intercept] 0.07 0.30 -0.53 0.65
r_cafe[1,afternoon] -0.03 0.29 -0.60 0.53
r_cafe[2,afternoon] 0.23 0.30 -0.36 0.79
r_cafe[3,afternoon] -0.80 0.30 -1.39 -0.24
r_cafe[4,afternoon] -0.10 0.28 -0.65 0.43
r_cafe[5,afternoon] 1.00 0.30 0.42 1.57
r_cafe[6,afternoon] -0.16 0.28 -0.72 0.38
r_cafe[7,afternoon] 0.10 0.28 -0.46 0.64
r_cafe[8,afternoon] -0.50 0.29 -1.09 0.07
r_cafe[9,afternoon] -0.17 0.28 -0.73 0.38
r_cafe[10,afternoon] 0.18 0.29 -0.39 0.75
r_cafe[11,afternoon] 0.70 0.29 0.15 1.27
r_cafe[12,afternoon] -0.06 0.29 -0.63 0.50
r_cafe[13,afternoon] -0.68 0.29 -1.25 -0.12
r_cafe[14,afternoon] 0.19 0.29 -0.36 0.78
r_cafe[15,afternoon] -1.06 0.31 -1.65 -0.44
r_cafe[16,afternoon] 0.09 0.28 -0.47 0.64
r_cafe[17,afternoon] -0.09 0.28 -0.64 0.47
r_cafe[18,afternoon] 0.10 0.30 -0.49 0.70
r_cafe[19,afternoon] 0.87 0.30 0.29 1.47
r_cafe[20,afternoon] 0.07 0.29 -0.49 0.66
lprior -5.06 0.44 -6.07 -4.36
lp__ -197.20 7.16 -212.07 -184.27
Let’s get the slopes and intercepts for each cafe.
Code
intercepts = coef (m2)$ cafe[ ,, "Intercept" ]
slopes = coef (m2)$ cafe[,, "afternoon" ]
cafe_params = data.frame (
cafe= 1 : 20 ,
intercepts= intercepts[, 1 ],
slopes= slopes[, 1 ]
)
cafe_params %>% round (3 )
cafe intercepts slopes
1 1 4.215 -1.156
2 2 2.158 -0.904
3 3 4.367 -1.928
4 4 3.243 -1.232
5 5 1.875 -0.135
6 6 4.260 -1.294
7 7 3.617 -1.027
8 8 3.945 -1.633
9 9 3.979 -1.305
10 10 3.563 -0.953
11 11 1.927 -0.430
12 12 3.841 -1.186
13 13 3.881 -1.809
14 14 3.175 -0.938
15 15 4.455 -2.192
16 16 3.390 -1.040
17 17 4.217 -1.219
18 18 5.748 -1.028
19 19 3.247 -0.260
20 20 3.729 -1.056
Code
cafe_params %>%
ggplot ( aes (x= intercepts, y= slopes) ) +
geom_point (size = 2 )
Code
cafe_params %>%
ggplot ( aes (x= intercepts, y= slopes) ) +
stat_ellipse () +
geom_point (size = 2 )
Code
cafe_params %>%
ggplot ( aes (x= intercepts, y= slopes) ) +
mapply (function (level) {
stat_ellipse (geom = "polygon" , type = "norm" ,
linewidth = 0 , alpha = .1 , fill = "#1c5253" ,
level = level)
},
# enter the levels here
level = c (1 : 9 / 10 , .99 )) +
geom_point (size = 2 )
More about stat_ellipse here .
exercise
Now use the slopes and intercepts to calculate the expected morning and afternoon wait times for each cafe. Plot these as a scatterplot. Bonus for ellipses.
Code
cafe_params %>%
mutate (
morning = intercepts,
afternoon = intercepts + slopes
) %>%
ggplot ( aes (x= morning, y= afternoon) ) +
mapply (function (level) {
stat_ellipse (geom = "polygon" , type = "norm" ,
linewidth = 0 , alpha = .1 , fill = "#1c5253" ,
level = level)
},
# enter the levels here
level = c (1 : 9 / 10 , .99 )) +
geom_point (size = 2 )+
labs (x= "morning wait time" ,
y= "afternoon wait time" )
What is the covariance of our intercepts and slopes?
Code
post = as_draws_df (m2)
rlkj_2 = rethinking:: rlkjcorr (nrow (post), K= 2 , eta= 2 )
data.frame (prior= rlkj_2[,1 ,2 ],
posterior = post$ cor_cafe__Intercept__afternoon) %>%
ggplot () +
geom_density (aes (x= prior, color = "prior" ), alpha= .3 ) +
geom_density (aes (x= posterior, color = "posterior" ), alpha= .3 ) +
scale_color_manual (values = c ("#1c5253" , "#e07a5f" )) +
labs (x= "R" ) +
theme (legend.position = "top" )
predictors at different levels
Returning to our personality data from Tuesday.
data_path = "https://raw.githubusercontent.com/sjweston/uobayes/refs/heads/main/files/data/external_data/mlm.csv"
d <- read.csv (data_path)
d %>% count (wave)
wave n
1 1 91
2 2 88
3 3 35
4 4 5
group n
1 CTRL 58
2 PD 161
Code
d %>%
ggplot (aes (x= week, y= con, group= id)) +
geom_line (alpha= .3 ) +
facet_wrap (~ group)
week is a within-person variable (Level 1), whereas group is a between-person variable (Level 2).
Mathematical model with level-2 covariate
Likelihood function and linear model
\[\begin{align*}
\text{con}_i &\sim \text{Normal}(\mu_i, \sigma) \\
\mu_i &= \alpha_{\text{id}[i]} + \beta_{\text{id}[i]}t_i
\end{align*}\]
Varying intercepts and slopes with level-2 predictor:
\[\begin{align*}
\alpha_{\text{id}[i]} &= \alpha + \gamma_\alpha G_{\text{id}[i]} + u_{\alpha,\text{id}[i]} \\
\beta_{\text{id}[i]} &= \beta + \gamma_\beta G_{\text{id}[i]} + u_{\beta,\text{id}[i]}
\end{align*}\]
Random effects:
\[\begin{align*}
\begin{bmatrix} u_{\alpha,\text{id}[i]} \\ u_{\beta,\text{id}[i]} \end{bmatrix} &\sim \text{MVNormal}\left(\begin{bmatrix} 0 \\ 0 \end{bmatrix}, \mathbf{S}\right) \\
\mathbf{S} &= \begin{pmatrix} \sigma_{\alpha} & 0 \\ 0 & \sigma_{\beta}\end{pmatrix}\mathbf{R}\begin{pmatrix} \sigma_{\alpha} & 0 \\ 0 & \sigma_{\beta}\end{pmatrix}
\end{align*}\]
Priors: \[\begin{align*}
\alpha &\sim \text{Normal}(0,1) \\
\beta &\sim \text{Normal}(0,1) \\
\gamma_\alpha &\sim \text{Normal}(0,1) \\
\gamma_\beta &\sim \text{Normal}(0,1) \\
\sigma &\sim \text{Exponential}(1) \\
\sigma_{\alpha} &\sim \text{Exponential}(1) \\
\sigma_{\beta} &\sim \text{Exponential}(1) \\
\mathbf{R} &\sim \text{LKJcorr}(2)
\end{align*}\]
m3 <- brm (
data= d,
family= gaussian,
con ~ 1 + week + group + week: group + (1 + week | id),
prior = c ( prior ( normal (0 ,1 ), class= Intercept ),
prior ( normal (0 ,1 ), class= b ),
prior ( exponential (1 ), class= sigma ),
prior ( exponential (1 ), class= sd ),
prior ( lkj (2 ), class= cor)),
iter= 4000 , warmup= 1000 , seed= 9 , cores= 4 , chains= 4 ,
file= here ("files/models/71.3" )
)
Divergent transitions! What do we do?
Family: gaussian
Links: mu = identity; sigma = identity
Formula: con ~ 1 + week + group + week:group + (1 + week | id)
Data: d (Number of observations: 216)
Draws: 4 chains, each with iter = 4000; warmup = 1000; thin = 1;
total post-warmup draws = 12000
Multilevel Hyperparameters:
~id (Number of levels: 88)
Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
sd(Intercept) 0.06 0.01 0.05 0.07 1.00 4017 6305
sd(week) 0.00 0.00 0.00 0.01 1.00 1909 2432
cor(Intercept,week) -0.08 0.39 -0.75 0.72 1.00 12802 7985
Regression Coefficients:
Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
Intercept 0.20 0.01 0.17 0.23 1.00 5915 7919
week 0.00 0.00 -0.01 0.01 1.00 10943 7070
groupPD -0.01 0.02 -0.04 0.02 1.00 6071 7209
week:groupPD -0.01 0.01 -0.02 0.01 1.00 11028 6548
Further Distributional Parameters:
Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
sigma 0.05 0.00 0.04 0.05 1.00 3847 4795
Draws were sampled using sampling(NUTS). For each parameter, Bulk_ESS
and Tail_ESS are effective sample size measures, and Rhat is the potential
scale reduction factor on split chains (at convergence, Rhat = 1).
as_draws_df (m3) %>%
ggplot ( aes (x= b_Intercept, y= b_groupPD) ) +
geom_point ()
Center your variables:
d = d %>%
mutate (
across ( c (con, week),
~ .- mean (., na.rm= T),
.names= "{col}_c" ))
m3b <- brm (
data= d,
family= gaussian,
con_c ~ 1 + week_c + group + week_c: group + (1 + week_c | id),
prior = c ( prior ( normal (0 ,1 ), class= Intercept ),
prior ( normal (0 ,1 ), class= b ),
prior ( exponential (1 ), class= sigma ),
prior ( exponential (1 ), class= sd ),
prior ( lkj (2 ), class= cor)),
iter= 4000 , warmup= 1000 , seed= 9 , cores= 4 , chains= 4 ,
file= here ("files/models/71.3b" )
)
p1 = as_draws_df (m3) %>%
ggplot ( aes (x= sd_id__week,
y= cor_id__Intercept__week) ) +
geom_point ()
p2 = as_draws_df (m3b) %>%
ggplot ( aes (x= sd_id__week_c, y= cor_id__Intercept__week_c) ) +
geom_point ()
p1 + p2